1. Introduction

The extraordinary development of the digital computer has induced a growing
concern with the difficult domain of neurophysiology and the very much more
inchoate study of thinking and thought processes. Many mathematicians, and
others, have been attracted by the challenge of constructing computer programs
which will carry out activities ordinarily requiring human intelligence. Here we
have in mind such processes as pattern recognition, musical composition, chess
playing, and theorem proving.

It is of interest and importance, then, to see whether it is possible to define in precise
terms what we mean, or even the many different things we could mean, by the
term intelligent machine. Perhaps even more important is to examine in fine
detail what is involved in an attempt to introduce such concepts as levels of
intelligence, learning, and instinct in such a way as to facilitate reasoned scientific
discourse in this area. As we shall see, there are considerable difficulties, and, as
the reader will soon note, we raise more questions than we answer. These
questions do not appear to be insurmountable, but their answers appear to
require a level of mathematical sophistication and analysis equivalent
to that required for the theory of sets and the mathematical theory of logic, and
perhaps most closely related to that used in the Liouville theory of the integration
of elementary functions in terms of elementary functions [1].

Our basic approach is to imbed the concept of intelligence within the concept
of decision making. We then consider various classes of multistage decision
processes to which we attach certain familiar names. Admittedly, this is a narrow
approach, but perhaps precisely for this reason we may be able to obtain some
precision.

2. Multistage decision processes

Let $p$ be a point in a space $S$, $q$ a point in a space $D$, and $T(p, q)$ a
transformation with the property that $T(p, q) \in S$ whenever $p \in S$, $q \in D$.
Call $p$ the state vector and $q$ the decision vector (see [2]). Consider a sequence
of points in $S$ generated in the following fashion:

(2.1)  $p_1 = T(p, q_1), \quad p_2 = T(p_1, q_2), \quad \ldots, \quad p_n = T(p_{n-1}, q_n).$

The decision $q_n$ that is made at the $n$th stage will in general depend on the past
history of the process,

(2.2)  $q_n = f(p, p_1, \ldots, p_{n-1};\ q_1, q_2, \ldots, q_{n-1}).$

Our aim is to delineate various classes of decision processes by recognizing
distinct ways in which this dependence can hold. A natural way to begin is with
the case where the decision at time $n$ is a function only of the state at time $n$,

(2.3)  $q_n = h(p_n).$

This can be regarded as a mathematical model of a class of phenomena lumped
under the heading “instinct”: a particular stimulus produces a certain effect.
If, however, we allow the full dependence of (2.2), in which the decision
depends upon a knowledge and use of past events, we can regard it as a model of
the class of phenomena labeled “learning and adaptation.”
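A minimal Python sketch of the contrast between (2.3) and (2.2), under assumptions that are ours rather than the text's: $S$ and $D$ are both the real line, $T(p, q) = p + q$, and the rules passed in (`lambda p: -0.5 * p` and the hypothetical `momentum_rule`) are purely illustrative. To keep the loop causal, each decision is computed from the most recent state available when it is made.

```python
from typing import Callable, List, Sequence, Tuple


def T(p: float, q: float) -> float:
    """Hypothetical transformation T(p, q): the decision simply shifts the state."""
    return p + q


def run_instinctive(p0: float, h: Callable[[float], float], n: int) -> Tuple[List[float], List[float]]:
    """Generate the sequence (2.1) with a memoryless rule: each decision sees only the latest state."""
    states: List[float] = [p0]
    decisions: List[float] = []
    for _ in range(n):
        q = h(states[-1])
        decisions.append(q)
        states.append(T(states[-1], q))
    return states, decisions


def run_learning(
    p0: float, f: Callable[[Sequence[float], Sequence[float]], float], n: int
) -> Tuple[List[float], List[float]]:
    """Generate (2.1) with a history-dependent rule: each decision sees all past states and decisions, as in (2.2)."""
    states: List[float] = [p0]
    decisions: List[float] = []
    for _ in range(n):
        q = f(tuple(states), tuple(decisions))
        decisions.append(q)
        states.append(T(states[-1], q))
    return states, decisions


def momentum_rule(states: Sequence[float], decisions: Sequence[float]) -> float:
    """Hypothetical learning rule: move halfway toward 0, plus half of the previous decision."""
    step = -0.5 * states[-1]
    return step + 0.5 * decisions[-1] if decisions else step


print(run_instinctive(8.0, lambda p: -0.5 * p, 3))  # ([8.0, 4.0, 2.0, 1.0], [-4.0, -2.0, -1.0])
print(run_learning(8.0, momentum_rule, 3))          # ([8.0, 4.0, 0.0, -2.0], [-4.0, -4.0, -2.0])
```

The two loops differ only in what the rule is allowed to see, which is exactly the distinction the text draws between the two labels.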

There are several caveats that must be uttered immediately. Suppose that we
redefine our process by introducing the new state vector

(2.4)  $\pi_n = (p, p_1, \ldots, p_n;\ q_1, q_2, \ldots, q_n).$

Then,

(2.5)  $\pi_n = T_1(\pi_{n-1}, q_n), \qquad q_n = \phi(\pi_n).$

This has the same form as (2.3). How, then, do we distinguish between an instinctive
and a learning process?
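The caveat can be made concrete with a short Python sketch, assuming a hypothetical $T(p, q) = p + q$ on the real line and a hypothetical history-dependent rule `momentum_rule`. Packing the whole record into the augmented state $\pi$ of (2.4) turns the learning rule into a rule $\phi$ of the current augmented state alone, so the run loop has the memoryless shape of (2.3). To keep the loop causal, $\phi$ is applied to the augmented state available before the $n$th decision is made.

```python
from typing import Callable, Sequence, Tuple

# Augmented state pi_n = (p, p_1, ..., p_n; q_1, ..., q_n), stored as (states, decisions).
Augmented = Tuple[Tuple[float, ...], Tuple[float, ...]]


def T(p: float, q: float) -> float:
    """Hypothetical original transformation on the real line."""
    return p + q


def T1(pi: Augmented, q: float) -> Augmented:
    """Lifted transformation T_1 of (2.5): advance the latest state, then append both it and q."""
    states, decisions = pi
    return states + (T(states[-1], q),), decisions + (q,)


def lift(f: Callable[[Sequence[float], Sequence[float]], float]) -> Callable[[Augmented], float]:
    """Turn a history-dependent rule f(states, decisions) into a rule phi(pi) of the augmented state."""
    def phi(pi: Augmented) -> float:
        states, decisions = pi
        return f(states, decisions)
    return phi


def run_memoryless(pi0: Augmented, phi: Callable[[Augmented], float], n: int) -> Augmented:
    """Same loop shape as an instinctive process: each decision depends only on the current state pi."""
    pi = pi0
    for _ in range(n):
        pi = T1(pi, phi(pi))
    return pi


def momentum_rule(states: Sequence[float], decisions: Sequence[float]) -> float:
    """Hypothetical learning rule: move halfway toward 0, plus half of the previous decision."""
    step = -0.5 * states[-1]
    return step + 0.5 * decisions[-1] if decisions else step


print(run_memoryless(((8.0,), ()), lift(momentum_rule), 3))
# ((8.0, 4.0, 0.0, -2.0), (-4.0, -4.0, -2.0))
```

Only the bookkeeping has changed: the same rule, fed its full history directly, produces the same trajectory. This is why the labels cannot be read off the form of the equations alone.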

This is, of course, a familiar question in the theory of stochastic processes,
where the terms Markovian and non-Markovian are used loosely. The point we
wish to raise is that it is no easy matter to make this type of labeling precise.
We would prefer to stay away from questions of detailed analytic structure
and dimensionality, but it may be that this is not possible if we wish a useful
taxonomy.

Furthermore, is it possible that the foregoing is not merely a mathematical
device, but actually represents the possibility that learning itself may be an
instinctive phenomenon? This type of question arises again below.

A second point, as important as the first, is that it is most certainly
unwise at this stage to attach ordinary words such as instinctive and learning,
with their fuzzy cloud of intuitive associations, to specific mathematical models of
rather simple structure. We know full well that these models are not representative
of the entire set of responses connected with the phenomena of instinct and
learning. Examples of this narrow use of important terms are information theory,
decision theory, and learning theory, and a number of others could be given. It is for
this reason that we prefer the term adaptive, which is not as much a part of the
usual vocabulary.

3. Adaptive processes

In order to construct useful models of processes involving learning and
adaptation for purposes of simulation and analytic study, it is necessary to
consider specific forms of the dependence upon the past. A very useful
mathematical model is the following. We begin by replacing (2.3) by

(3.1)  $q_n = h(p_n, a),$

where $a$ is a vector parameter. We are now talking about a family of responses,
with the individual member of the family specified by a particular value of $a$.

To introduce the adaptive aspects, let $a_n$ be made a function of the history of
the process,

(3.2)  $a_n = f(p, p_1, \ldots, p_n;\ q_1, q_2, \ldots, q_{n-1};\ a_1, a_2, \ldots, a_{n-1}).$

Then, 

(3.3)  $q_n = h(p_n, a_n).$

Processes of this type can be taken to represent certain simple types of learning
processes. The question of how $a_n$ should be determined on the basis of the past
history of the process is a difficult one, which can be discussed by means of a
number of mathematical theories ranging from decision theory and dynamic
programming to nonlinear prediction theory and quasilinearization [2], [3].
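As a hedged illustration of (3.1) through (3.3), here is a Python sketch with a scalar rather than a vector parameter. Everything concrete is our hypothetical choice: $T(p, q) = p + q$, the response family $h(p, a) = -ap$ (a gain $a$ pulling the state toward 0), and a rule `adapt_gain` for $a_n$ that halves the previous gain whenever the last step failed to reduce $|p|$. Each decision uses the most recent state, which keeps the loop causal. The sketch shows the structure only; it is not a recommended way of choosing $a_n$.

```python
from typing import List, Sequence, Tuple


def T(p: float, q: float) -> float:
    """Hypothetical transformation: the decision shifts the state."""
    return p + q


def h(p: float, a: float) -> float:
    """Hypothetical family of responses (3.1): a linear rule with gain a."""
    return -a * p


def adapt_gain(states: Sequence[float], decisions: Sequence[float],
               params: Sequence[float], a0: float) -> float:
    """Hypothetical rule f of (3.2): start from a0, then halve the previous gain
    whenever the last step failed to reduce |p|. The past decisions are
    available to f but unused by this particular rule."""
    if not params:
        return a0
    if abs(states[-1]) >= abs(states[-2]):
        return params[-1] / 2
    return params[-1]


def run_adaptive(p0: float, a0: float, n: int) -> Tuple[List[float], List[float], List[float]]:
    states: List[float] = [p0]
    decisions: List[float] = []
    params: List[float] = []
    for _ in range(n):
        a = adapt_gain(states, decisions, params, a0)  # a_n chosen from the history
        params.append(a)
        q = h(states[-1], a)                           # decision from the current member of the family
        decisions.append(q)
        states.append(T(states[-1], q))
    return states, decisions, params


states, decisions, params = run_adaptive(8.0, 2.5, 4)
print(params)  # [2.5, 1.25, 1.25, 1.25]
print(states)  # [8.0, -12.0, 3.0, -0.75, 0.1875]
```

The initial gain overshoots, the rule halves it once, and the halved gain then drives the state toward 0.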

4. Policies and policies for determining policies

We began with a rule for making decisions, (3.1). Let us call this rule a policy.
Then we formulated a rule for modifying policies on the basis of experience. This
is a policy for producing policies. Can we go one step further? How do we modify
these metapolicies? One approach is to mimic what we did before. Write

(4.1)  $p_n = T(p_{n-1}, q_n), \qquad q_n = h(p_n, a_n),$
       $a_n = f(p, p_1, \ldots, p_n;\ q_1, q_2, \ldots, q_{n-1};\ a_1, a_2, \ldots, a_{n-1};\ b),$

where $b$ is again a vector parameter, which we can make dependent upon the
history of the process.

A policy for modifying policies is now a prescription for the determination of $b$
at each stage as a function of the past history of the process,

(4.2)  $b_n = g(p, p_1, \ldots, p_n;\ q_1, q_2, \ldots, q_{n-1};\ a_1, a_2, \ldots, a_{n-1};\ b_1, b_2, \ldots, b_{n-1}).$

But now we encounter a curious difficulty. How do we differentiate between these
last two types of learning processes? In both cases, we end up with a policy of
the form

(4.3)  $q_n = f(p, p_1, \ldots, p_{n-1};\ q_1, q_2, \ldots, q_{n-1}).$

We know that we have iterated policies, and we feel intuitively that we are
operating on a higher level of intelligence. In terms of our initial definition,
however, we appear never to be able to rise above a learning process of the
simplest type.
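The collapse described above can be checked mechanically. The Python sketch below adds a hypothetical second level to a toy adaptive process ($T(p, q) = p + q$, $h(p, a) = -ap$, scalar parameters): $b$ is the factor by which the gain $a$ is shrunk after a step that fails to reduce $|p|$, and the metapolicy `g` halves $b$ after two consecutive steps that increase $|p|$. The function `collapsed_policy` then rebuilds every $a_i$ and $b_i$ from the past states and decisions alone, so the two-level scheme is just one more rule of the form (4.3). All rules and names here are illustrative assumptions.

```python
from typing import List, Sequence


def T(p: float, q: float) -> float:
    return p + q


def h(p: float, a: float) -> float:
    """Hypothetical response family: linear rule with gain a."""
    return -a * p


def f(states: Sequence[float], decisions: Sequence[float],
      params: Sequence[float], b: float, a0: float) -> float:
    """Hypothetical policy-modifying rule, as in (4.1): shrink the gain by b after a step that failed to reduce |p|."""
    if not params:
        return a0
    if abs(states[-1]) >= abs(states[-2]):
        return params[-1] * b
    return params[-1]


def g(states: Sequence[float], decisions: Sequence[float], params: Sequence[float],
      factors: Sequence[float], b0: float) -> float:
    """Hypothetical metapolicy, as in (4.2): halve b after two consecutive steps that increased |p|."""
    if not factors:
        return b0
    if len(states) >= 3 and abs(states[-1]) > abs(states[-2]) > abs(states[-3]):
        return factors[-1] / 2
    return factors[-1]


def run_two_level(p0: float, a0: float, b0: float, n: int):
    states, decisions, params, factors = [p0], [], [], []
    for _ in range(n):
        factors.append(g(states, decisions, params, factors, b0))  # b_n
        params.append(f(states, decisions, params, factors[-1], a0))  # a_n
        q = h(states[-1], params[-1])
        decisions.append(q)
        states.append(T(states[-1], q))
    return states, decisions, params, factors


def collapsed_policy(states: Sequence[float], decisions: Sequence[float], a0: float, b0: float) -> float:
    """The same process seen as a single rule of past states and decisions, as in (4.3):
    replay the history to rebuild every b_i and a_i, then decide."""
    params: List[float] = []
    factors: List[float] = []
    for i in range(len(states)):
        prefix_s, prefix_q = states[: i + 1], decisions[:i]
        factors.append(g(prefix_s, prefix_q, params, factors, b0))
        params.append(f(prefix_s, prefix_q, params, factors[-1], a0))
    return h(states[-1], params[-1])


states, decisions, params, factors = run_two_level(8.0, 8.0, 0.5, 4)
print(factors)  # [0.5, 0.5, 0.25, 0.25]
print(all(collapsed_policy(states[: n + 1], decisions[:n], 8.0, 0.5) == decisions[n]
          for n in range(len(decisions))))  # True
```

Nothing in the collapsed rule reveals that two levels of policy were involved, which mirrors the difficulty raised in the text.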

The difficulty may lie in the generality we have attempted. If we deal with
general functions, there is no way of distinguishing between an arbitrary function
and the iterate of an arbitrary function. If, on the other hand, we consider special
classes of functions, then we can construct a meaningful hierarchy. This is the
basic idea of the Liouville theory of the integration of elementary functions [1].

5. An alternative approach

Instead of pursuing the foregoing approach, we can assume that a policy for
modifying policies depends not only upon the result of a single history, but
upon the observation of a number $N$ of histories. Thus, we suppose that we begin
with the processes

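(5.1)  $p_n^{(k)} = T(p_{n-1}^{(k)}, q_n^{(k)}), \qquad q_n^{(k)} = h(p_n^{(k)}, a_n^{(k)}),$
       $a_n^{(k)} = f(p^{(k)}, p_1^{(k)}, \ldots, p_n^{(k)};\ q_1^{(k)}, \ldots, q_{n-1}^{(k)};\ a_1^{(k)}, \ldots, a_{n-1}^{(k)};\ b_n),$

where $k = 1, 2, \ldots, N$.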

Now let $b_n$ be chosen as a function of the histories of all the processes,

(5.2)  $b_n = g(p_i^{(k)}, q_i^{(k)}, a_i^{(k)};\ i = 1, 2, \ldots, n;\ b_1, b_2, \ldots, b_{n-1}).$

This yields an adaptive process which is on a higher level than the simple
learning process described above. We can now enlarge this process in a similar
fashion and thus obtain the desired hierarchy of adaptive processes.
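A Python sketch of (5.1) and (5.2) in a hypothetical toy setting ($T(p, q) = p + q$, $h(p, a) = -ap$, scalar parameters): $N$ processes run side by side, each adapting its own gain $a_n^{(k)}$, while a single shrink factor $b_n$ is chosen from all $N$ histories at once. The pooling rule in `g` (halve $b$ when a majority of the processes failed to reduce $|p|$ at their last step) is our illustrative assumption, not a rule from the text.

```python
from typing import List, Sequence


def T(p: float, q: float) -> float:
    return p + q


def h(p: float, a: float) -> float:
    """Hypothetical response family: linear rule with gain a."""
    return -a * p


def f(states: Sequence[float], params: Sequence[float], b: float, a0: float) -> float:
    """Per-process rule for a_n^(k), as in (5.1): shrink the gain by the shared
    factor b after a step that failed to reduce |p| (past decisions unused here)."""
    if not params:
        return a0
    if abs(states[-1]) >= abs(states[-2]):
        return params[-1] * b
    return params[-1]


def g(all_states: Sequence[Sequence[float]], factors: Sequence[float], b0: float) -> float:
    """Pooled rule for b_n, as in (5.2): halve b when a majority of the N
    processes failed to reduce |p| at their last step."""
    if not factors:
        return b0
    failed = sum(1 for s in all_states if abs(s[-1]) >= abs(s[-2]))
    return factors[-1] / 2 if 2 * failed > len(all_states) else factors[-1]


def run_population(initial_states: Sequence[float], a0: float, b0: float, n: int):
    N = len(initial_states)
    all_states: List[List[float]] = [[p0] for p0 in initial_states]
    all_params: List[List[float]] = [[] for _ in range(N)]
    factors: List[float] = []
    for _ in range(n):
        factors.append(g(all_states, factors, b0))  # one b_n shared by all N processes
        for k in range(N):
            all_params[k].append(f(all_states[k], all_params[k], factors[-1], a0))
            q = h(all_states[k][-1], all_params[k][-1])
            all_states[k].append(T(all_states[k][-1], q))
    return all_states, all_params, factors


all_states, all_params, factors = run_population([8.0, -4.0, 2.0], 3.0, 0.5, 3)
print(factors)                      # [0.5, 0.25, 0.25]
print([s[-1] for s in all_states])  # [-1.0, 0.5, -0.25]
```

With $N = 1$, $b_n$ would depend on a single history, as in Section 4; pooling over $k$ is what distinguishes the construction of this section.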

6. Discussion

A number of questions arise immediately. Are there other types of hierarchies?
One would imagine so. For example, do we have a genuine model of creativity
in the foregoing?

A second problem of importance in a number of fields is the inverse problem.
Given a description of a type of behavior which is instinctive or intelligent, can
we explain it in terms of a mathematical model of the foregoing type? This is
the kind of problem studied in [4].

With the construction of various theories of the type proposed above, we can
begin to discuss the question, “Can machines think?” in a rational fashion. We
would first modify it to read, “Can machines perform level $k$ thinking?” This is
now a definite question which can be answered affirmatively by a computer
program, or an existence proof for a computer program.

This is the traditional approach used in mathematics to remove mysticism
from the unknown: the approach followed in the study of the infinite, in the
study of logical statements, in the study of divergent series, in the Liouville theory
mentioned above, and so on.